Mining maximal cliques from a large graph using MapReduce: Tackling highly uneven subproblem sizes
Abstract
We consider Maximal Clique Enumeration (MCE) from a large graph. A maximal clique is perhaps the most fundamental dense substructure in a graph, and MCE is an important tool to discover densely connected subgraphs, with numerous applications to data mining on web graphs, social networks, and biological networks. While effective sequential methods for MCE are known, scalable parallel methods for MCE are still lacking. We present a new parallel algorithm for MCE, Parallel Enumeration of Cliques using Ordering (PECO), designed for the MapReduce framework. Unlike previous works, which required a post-processing step to remove duplicate and non-maximal cliques, PECO enumerates only maximal cliques with no duplicates. The key technical ingredient is a total ordering of the vertices of the graph, which is used in a novel way to achieve a load-balanced distribution of work and to eliminate redundant work among processors. We implemented PECO on Hadoop MapReduce, and our experiments on a cluster show that the algorithm can effectively process a variety of large real-world graphs with millions of vertices and tens of millions of maximal cliques, and scales well with the degree of available parallelism.
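The ordering idea in the abstract can be illustrated with a small single-machine sketch. It is not the authors' Hadoop implementation: the rank function (degree, ties broken by vertex id), the helper names, and the use of Bron-Kerbosch with pivoting as the sequential routine are all assumptions made for illustration. Each vertex "owns" exactly the maximal cliques in which it has the lowest rank, so the per-vertex subproblems can run independently (for example, one per reduce task) and no clique is reported twice.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, List, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build a symmetric adjacency map from an undirected edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def rank(graph: Graph, v: int) -> Tuple[int, int]:
    """Assumed total order: lower degree first, ties broken by vertex id."""
    return (len(graph[v]), v)


def bron_kerbosch(graph: Graph, r: Set[int], p: Set[int], x: Set[int]) -> Iterator[FrozenSet[int]]:
    """Bron-Kerbosch with pivoting: yield every maximal clique extending r.

    p holds candidates that may still join r; x holds vertices that must not
    join r but whose presence proves a clique is non-maximal.
    """
    if not p and not x:
        yield frozenset(r)
        return
    # Pick the pivot with the most neighbours in p to prune branches.
    pivot = max(p | x, key=lambda u: len(graph[u] & p))
    for v in list(p - graph[pivot]):
        yield from bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v])
        p = p - {v}
        x = x | {v}


def cliques_owned_by(graph: Graph, v: int) -> Iterator[FrozenSet[int]]:
    """Subproblem for v: maximal cliques in which v has the lowest rank.

    Higher-ranked neighbours are candidates; lower-ranked neighbours go into
    the exclusion set, which suppresses cliques owned by another vertex.
    """
    higher = {u for u in graph[v] if rank(graph, u) > rank(graph, v)}
    lower = graph[v] - higher
    yield from bron_kerbosch(graph, {v}, higher, lower)


def enumerate_maximal_cliques(graph: Graph) -> List[FrozenSet[int]]:
    """Run every per-vertex subproblem; in MapReduce these would be independent tasks."""
    cliques: List[FrozenSet[int]] = []
    for v in graph:
        cliques.extend(cliques_owned_by(graph, v))
    return cliques


if __name__ == "__main__":
    # Triangle 1-2-3 with a pendant edge 3-4.
    g = build_graph([(1, 2), (1, 3), (2, 3), (3, 4)])
    for clique in enumerate_maximal_cliques(g):
        print(sorted(clique))
```

Ranking by degree in this sketch sends each clique to its lowest-degree member, so a high-degree vertex ends up with few candidates of its own. That is one plausible way an ordering can even out subproblem sizes; the exact orderings evaluated in the paper are not stated in the abstract.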
Related resources
School of IT Technical Report DATA PREPARATION FOR MINING COMPLEX PATTERNS IN LARGE SPATIAL DATABASES
The aim of the thesis is to design an efficient algorithm for data preparation in large spatial databases for the purpose of data mining. To find complex spatial patterns, the raw data needs to be converted into a set of cliques. In our case the raw data was a 1% sample from the Sloan Digital Sky Survey database which contains 818 Gigabytes of astronomical informati...
Mining λ-Maximal Cliques from a Fuzzy Graph
The depletion of natural resources in the last century now threatens our planet and the life of future generations. For the sake of sustainable development, this paper pioneers an interesting and practical problem of dense substructure (i.e., maximal cliques) mining in a fuzzy graph where the edges are weighted by the degree of membership. For parameter 0 ≤ λ ≤ 1 (also called fuzzy cut in fuzzy...
Interaction graph mining for protein complexes using local clique merging.
While recent technological advances have made available large datasets of experimentally-detected pairwise protein-protein interactions, there is still a lack of experimentally-determined protein complex data. To make up for this lack of protein complex data, we explore the mining of existing protein interaction graphs for protein complexes. This paper proposes a novel graph mining algorithm to...
Arabesque: A System for Distributed Graph Mining - Extended version
Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms are not a good match for distributed graph mining problems, such as finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very la...
Constraint-Based Mining of Sets of Cliques Sharing Vertex Properties
We consider data mining methods on large graphs where a set of labels is associated with each vertex. A typical example of such graphs is a social network of collaborating researchers where additional information represents the main publication targets (preferred conferences or journals) for each author. We investigate the extraction of sets of dense subgraphs such that the vertices in all subgrap...
Journal:
- J. Parallel Distrib. Comput.
Volume 79-80
Publication date: 2015